Linear Ensembles of Word Embedding Models

Authors

Avo Muromägi

Kairit Sirts

Sven Laur

Abstract

This paper explores linear methods for combining several word embedding models into an ensemble. We construct the combined models using an iterative method based on either ordinary least squares regression or the solution to the orthogonal Procrustes problem. We evaluate the proposed approaches on Estonian, a morphologically complex language for which the available corpora for training word embeddings are relatively small. We compare the two combined models with each other and with the input word embedding models using synonym and analogy tests. The results show that ordinary least squares regression performs poorly in our experiments, whereas combining several word embedding models with orthogonal Procrustes yields 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests.
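The abstract does not spell out the iterative procedure, so the following is only a minimal sketch of one plausible reading: alternate between fitting an orthogonal map from each input model to a shared target and re-estimating the target as the mean of the mapped models. The function names, the initialization with the plain mean, the fixed iteration count, and the assumption that all models share the same vocabulary order and dimensionality are illustrative choices, not details taken from the paper. The only part that is standard is the closed-form SVD solution to the orthogonal Procrustes problem.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (classical SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def procrustes_ensemble(models, n_iter=20):
    """Hypothetical iterative combination of embedding matrices.

    models: list of arrays of identical shape (n_words, dim), rows aligned
    to the same vocabulary (an assumption of this sketch).
    """
    # Assumed initialization: the unaligned mean of the input models.
    Y = np.mean(models, axis=0)
    for _ in range(n_iter):
        # Fit one orthogonal map per model toward the current target.
        rotations = [procrustes_rotation(X, Y) for X in models]
        # Re-estimate the target as the mean of the mapped models.
        Y = np.mean([X @ W for X, W in zip(models, rotations)], axis=0)
    return Y, rotations
```

Under the same assumptions, a least-squares variant could replace `procrustes_rotation` with an unconstrained fit such as `np.linalg.lstsq(X, Y, rcond=None)[0]`; the difference is that the resulting map is no longer constrained to be orthogonal.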


Similar resources

The embedding method to obtain the solution of fuzzy linear systems

In this paper, we investigate the general fuzzy linear system of equations using the embedding approach. We find necessary and sufficient conditions for the existence of a fuzzy solution of such systems. Finally, numerical examples are presented to illustrate the proposed model.

Learning to Negate Adjectives with Bilinear Models

We learn a mapping that negates adjectives by predicting an adjective’s antonym in an arbitrary word embedding model. We show that both linear models and neural networks improve on this task when they have access to a vector representing the semantic domain of the input word, e.g. a centroid of temperature words when predicting the antonym of ‘cold’. We introduce a continuous class-conditional ...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Revisiting Embedding Features for Simple Semi-supervised Learning

Recent work has shown success in using continuous word embeddings learned from unlabeled data as features to improve supervised NLP systems, which is regarded as a simple semi-supervised learning mechanism. However, fundamental problems on effectively incorporating the word embedding features within the framework of linear models remain. In this study, we investigate and analyze three different...

Learning Word Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings – distributed representations of words – in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured semantics. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach of combining different public embeddi...

Publication date: 2017
